Self-similarity of complex networks and hidden metric spaces 
We demonstrate that the self-similarity of some scale-free networks with respect to a simple
degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network
nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this
framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that
a class of hidden variable models with underlying metric spaces is able to accurately reproduce the
self-similarity properties that we measured in the real networks. Our findings indicate that hidden 
geometries underlying these real networks are a plausible explanation for their observed topologies 
and, in particular, for their self-similarity with respect to the degree-based renormalization. 

Self-similarity and scale invariance are traditionally
known as characteristics of certain geometric objects,
such as fractals [1], or of field theories describing system
dynamics near critical points of phase transitions [2].
In these cases, objects or physical systems are intrinsically
embedded in metric spaces and distance scales in
these spaces are natural scaling factors. In complex networks
[3], scale invariance is traditionally restricted to
the scale-free property of the distributions of node degrees,
which in a vast majority of complex networks follow
power laws of the form $P(k) \sim k^{-\gamma}$, $\gamma \in [2,3]$. In
search of more complete self-similar descriptions, several
recent works [4] introduced box-covering renormalization
procedures, applied them to a few real networks,
and found that certain networks, e.g., the Web and some
biological networks, have finite fractal dimensions and
degree distributions that remain invariant.

Despite this promising progress, self-similarity and 
scale invariance of complex networks are still not well 
defined in a proper geometric sense. The reason is that 
many complex networks are not explicitly embedded in 
any physical space. As such, they lack any metric struc- 
ture, except the one that their graph abstractions induce 
by the collection of lengths of shortest paths between 
nodes. However, this observable topological metric is 
a poor source of length-based scaling factors. It does 
not have large lengths as it exhibits the small-world [5] 
or even ultrasmall-world [6] property, meaning that the 
characteristic path lengths grow not polynomially but 
(sub)logarithmically with the network size. The apparent
absence of any other metric structures supports the
common belief that complex networks cannot be invari- 
ant under geometric length scale transformations. 

In this paper, we undermine this belief by introducing 
the concept of hidden metric spaces as natural reservoirs 
of distance scales with respect to which scale-free networks
may be self-similar at all scales. At the formal
level, hidden metric spaces are variations of hidden variables
[7], [8], [9]. Specifically, we assume that all network
nodes, in addition to forming an observed network topol- 
ogy, reside in an underlying hidden metric space, meaning 
that for all pairs there are defined hidden distances sat- 
isfying the triangle inequality, which can be arbitrarily 
large. If hidden metric spaces do exist and play a role 
in shaping the observed network topologies, then strong 
clustering (the high concentration of triangles) arises as
a natural consequence of the triangle inequality in the un- 
derlying geometry. Therefore, we focus on clustering as 
a potential connection between the observed topologies 
and hidden geometries. 

Consider the following degree-thresholding renormalization
procedure, which produces a hierarchy of subgraphs
within a given graph $G$ as illustrated in Fig. 1.
For each degree threshold $k_T = 0, 1, 2, \ldots$, first extract
from $G$ the subgraph $G(k_T)$ induced by nodes with degrees
$k > k_T$. Second, for each node in $G(k_T)$, compute
its internal degree $k_i$, i.e., the number of links that connect
a given node to other nodes in $G(k_T)$ and, finally,
rescale the $k_i$'s by the average internal degree $\langle k_i(k_T) \rangle$ in
$G(k_T)$ to obtain the rescaled quantity $k_i/\langle k_i(k_T) \rangle$.
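
As a minimal sketch of one step of this procedure (not the authors' code), the snippet below takes an undirected simple graph stored as a dictionary of neighbor sets, keeps the nodes whose degree exceeds the threshold, and returns their internal and rescaled internal degrees. The function names `build_adjacency` and `renormalize` and the adjacency-dictionary representation are assumptions made for illustration.

```python
def build_adjacency(edges):
    """Build a dict node -> set of neighbours from an undirected edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def renormalize(adj, k_T):
    """Return (internal degrees, rescaled internal degrees) of G(k_T)."""
    # Nodes whose degree in the full graph G exceeds the threshold.
    kept = {v for v, nbrs in adj.items() if len(nbrs) > k_T}
    # Internal degree: number of neighbours that also belong to G(k_T).
    k_int = {v: len(adj[v] & kept) for v in kept}
    if not k_int:
        return {}, {}
    mean_k = sum(k_int.values()) / len(k_int)
    # Rescale by the average internal degree; an edgeless subgraph maps to 0.
    rescaled = {v: (k / mean_k if mean_k > 0 else 0.0) for v, k in k_int.items()}
    return k_int, rescaled


if __name__ == "__main__":
    toy = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2)])
    for k_T in (0, 1, 2):
        print(k_T, renormalize(toy, k_T))
```

Calling `renormalize` for increasing thresholds yields the subgraph hierarchy; the rescaled degrees are the quantities to compare across thresholds.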

We applied this procedure to a few real complex networks
and found that their main topological characteristics
(degree distributions, degree-degree correlations, and
clustering) are self-similar with respect to the described
procedure: both before and after renormalization with
different values of threshold $k_T$, all these characteristics
closely follow the same master curves describing the topo- 
logical structure of the whole subgraph hierarchy. We 
next randomized the observed topologies preserving the 
degree distribution as in [10], and found that their de- 
gree distributions and degree-degree correlations are still 
self-similar, but clustering is not. 
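
The text only cites [10] for the randomization. One common degree-preserving scheme, assumed here purely for illustration, is a sequence of double-edge swaps: two links (a, b) and (c, d) are replaced by (a, d) and (c, b) whenever this creates neither a self-loop nor a duplicate link. The names `randomize_preserving_degrees` and `degree_sequence` are hypothetical.

```python
import random


def degree_sequence(edges):
    """Return a dict node -> degree for an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def randomize_preserving_degrees(edges, n_swaps, seed=0):
    """Rewire a simple undirected graph by double-edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # give up eventually on very constrained graphs
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible pairings at random
        if a == d or c == b:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # the swap would create a duplicate link
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Each accepted swap leaves every node's degree unchanged, so the degree distribution is preserved exactly while triangles are created or destroyed only by chance.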

FIG. 1: Sketch of the degree-thresholding renormalization.

We provide examples of these empirical observations
in Fig. 2, where we show the degree-dependent clustering
coefficient of the renormalized graphs $G(k_T)$ with different
$k_T$'s for the real and randomized topologies of the
Border Gateway Protocol (BGP) map of the Internet
at the Autonomous System level [11] and of the Pretty
Good Privacy (PGP) social web of trust [12]. Both the
BGP and the PGP are scale-free networks with exponents
$\gamma_{BGP} = 2.2 \pm 0.2$ and $\gamma_{PGP} = 2.5 \pm 0.2$. For brevity, we
omit plots showing self-similarity of degree distributions
and degree-degree correlations. Fig. 2 shows that even
though the internal average degree $\langle k_i(k_T) \rangle$ grows significantly
for all networks, the average clustering coefficient
of $G(k_T)$ as a function of $k_T$, $\bar{c}(k_T)$, is nearly constant
for the subgraphs of the real topologies, but it grows for
their randomized counterparts. We also experimented
with airport networks [13] and found that they exhibit
qualitatively the same results as the Internet (BGP) and
the social (PGP) networks. The BGP and PGP networks
are more interesting and challenging for our purposes
since, as opposed to airport networks, they appear
to be not explicitly embedded in any observable physical
space [18].
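
The clustering measurements described above can be sketched as follows. The code computes the local clustering coefficient of every node, its average over nodes of the same degree, and its average over a thresholded subgraph. Assigning zero clustering to nodes of degree below two and including them in the averages is an assumption of this sketch; the text does not state which convention was used. All function names are hypothetical.

```python
def local_clustering(adj):
    """Local clustering coefficient of each node; adj maps node -> neighbour set."""
    c = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[v] = 0.0  # convention assumed here for degree-0 and degree-1 nodes
            continue
        # Each link between two neighbours of v is counted twice in the sum.
        links = sum(len(adj[u] & nbrs) for u in nbrs) / 2
        c[v] = 2.0 * links / (k * (k - 1))
    return c


def clustering_by_degree(adj):
    """Degree-dependent clustering: mean local clustering of nodes of degree k."""
    c = local_clustering(adj)
    groups = {}
    for v, nbrs in adj.items():
        groups.setdefault(len(nbrs), []).append(c[v])
    return {k: sum(vals) / len(vals) for k, vals in groups.items()}


def average_clustering(adj):
    """Mean local clustering over all nodes of the graph."""
    c = local_clustering(adj)
    return sum(c.values()) / len(c) if c else 0.0


def threshold_subgraph(adj, k_T):
    """Subgraph induced by the nodes whose degree in adj exceeds k_T."""
    kept = {v for v, nbrs in adj.items() if len(nbrs) > k_T}
    return {v: adj[v] & kept for v in kept}
```

Applying `average_clustering(threshold_subgraph(adj, k_T))` for a range of thresholds gives the kind of curve that is compared between a network and its randomized version.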

The high levels of clustering observed in real networks
and their self-similarity under the degree-thresholding
renormalization find a plausible explanation in the assumption
that some metric structures underlie the observed
network topologies. Indeed, under this assumption,
clustering becomes a natural consequence of the
triangle inequality in the metric space underneath. The
fact that the randomized networks are not self-similar
(cf. Fig. 2b,d) also supports this explanation. The applied
degree-preserving randomization is a process that
involves pairs of nodes, whereas the triangle inequality
concerns node triplets. Therefore, this randomization
process cannot fully preserve the network properties imposed
by the triangle inequality.

In the rest of the paper, we provide further evidence 
that this metric space explanation is indeed plausible. 
We do so by introducing a class of network models de- 
signed with the following three objectives: we want all 
nodes to exist in a metric space underlying the network 
topology; we want to control the degree distribution and 
clustering, so that we can generate scale-free graphs with 
strong clustering; and we want these graphs to be small-world.
We then find that the networks generated by
our model reproduce all the self-similar effects that we
have empirically observed in real networks. We emphasize
that although there are models of scale-free networks
embedded in Euclidean lattices [3], none can simultaneously
reproduce all the effects discussed above.

To define our model, we use the hidden variables formalism
[7], taking as hidden variables nodes' coordinates
in a metric space. Every pair of nodes is separated by a certain
hidden metric distance and connected with a probability
$r$, which relates the network topology to the underlying
metric space. This probability depends on the metric
distance $d$ as $r(d/d_c)$, where $d_c$ is the characteristic distance
scale, i.e., a parameter that calibrates whether a
given distance is short or long. Function $r$ must be a
positive integrable function of $d \in [0, \infty)$. Consequently,
nodes that are close to each other in the metric space are
more likely to be connected in the graph.

To engineer full control over the degree distribution,
we link the characteristic distance scale $d_c$ to the topology
of the network. We assume that $d_c$ is not a constant
but depends on some topological properties of the nodes.
Specifically, we assign an additional hidden random variable
$\kappa$ to each node, which corresponds to its expected
degree. For simplicity, let our hidden metric space be a
homogeneous and isotropic $D$-dimensional space. Then
the choice [15]

    $d_c(\kappa, \kappa') \propto (\kappa \kappa')^{1/D}$    (1)

guarantees that the average degree of nodes with variable
$\kappa$ is $\bar{k}(\kappa) = \kappa$. Therefore, the distribution $\rho(\kappa)$ of this
variable is asymptotically equal to the degree distribution
$P(k)$ in the resulting networks [7].
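
A short check of why a characteristic distance of this form makes the expected degree proportional to $\kappa$ may help here. The sketch assumes a uniform node density $\delta$ in flat $D$-dimensional space, negligible boundary effects, and $d_c^D = \mu \kappa \kappa'$ with a constant $\mu$. The symbols $\delta$, $\mu$, $\Omega_D$ (surface area of the unit sphere in $D$ dimensions, e.g. $\Omega_1 = 2$), and $I_D$ are notation introduced for this sketch, and $I_D$ is assumed to be finite.

```latex
\begin{align}
\bar{k}(\kappa)
  &= \delta \int \mathrm{d}\kappa'\, \rho(\kappa')
     \int_0^\infty \Omega_D\, d^{\,D-1}\,
     r\!\left(\frac{d}{d_c(\kappa,\kappa')}\right) \mathrm{d}d \\
  &= \delta\, \Omega_D \int \mathrm{d}\kappa'\, \rho(\kappa')\,
     d_c^{\,D}(\kappa,\kappa') \int_0^\infty x^{D-1} r(x)\, \mathrm{d}x
     \qquad \bigl(x = d/d_c\bigr) \\
  &= \delta\, \Omega_D\, I_D\, \mu\, \langle \kappa \rangle\, \kappa ,
     \qquad I_D \equiv \int_0^\infty x^{D-1} r(x)\, \mathrm{d}x .
\end{align}
```

The expected degree is therefore linear in $\kappa$ for any admissible $r$, and choosing $\mu = 1/(\delta\, \Omega_D\, I_D\, \langle \kappa \rangle)$ fixes the proportionality constant to one.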

Eq. (1) has another important consequence: high-degree
nodes (hubs) are likely to be connected regardless of
their distance in the metric space because $d_c$(hub, hub)
is large. Low-degree nodes, on the other hand, are connected
only if they are close, whereas hubs are connected
to low-degree nodes if their distances are at most intermediate.
This pattern is typical of real networks embedded
in metric spaces, such as the airport network. It is very
likely that two major hubs like New York and London are
connected, but it is very unlikely that two small airports
are connected, unless they are close enough.

We can now generate graphs with any degree distribution
by choosing an appropriate $\rho(\kappa)$. In particular, to
generate scale-free graphs we set $\rho(\kappa) = (\gamma - 1)\kappa_0^{\gamma-1}\kappa^{-\gamma}$,
$\kappa > \kappa_0 \equiv (\gamma - 2)\langle k \rangle/(\gamma - 1)$, $\gamma > 2$, which after transformations
described in [7] yields the degree distribution

    $P(k) = (\gamma - 1)\kappa_0^{\gamma-1} \frac{\Gamma(k + 1 - \gamma, \kappa_0)}{k!}$,    (2)

where $\Gamma$ is the incomplete gamma function. The asymptotic
behavior of this degree distribution for large $k$ is
$P(k) \sim k^{-\gamma}$, i.e., the same as of $\rho(\kappa)$. This result is independent
of the dimension of the hidden metric space
and it is valid for any integrable connection probability
of the form $r(d/d_c)$, where $d_c = d_c(\kappa, \kappa')$ is any function
that factorizes in terms of $\kappa$ and $\kappa'$.
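
One simple way to draw hidden variables from a power-law density of this kind is inverse-transform sampling: if the fraction of nodes with hidden variable above $\kappa$ is $(\kappa/\kappa_0)^{-(\gamma-1)}$, then $\kappa = \kappa_0 u^{-1/(\gamma-1)}$ with $u$ uniform in $(0, 1]$. The sketch below assumes exactly that form; the function names are hypothetical and the sampling method is a choice made here, not one stated in the text.

```python
import random


def kappa_min(gamma, mean_k):
    """Lower cutoff kappa_0 = (gamma - 2) <k> / (gamma - 1)."""
    return (gamma - 2.0) * mean_k / (gamma - 1.0)


def sample_kappa(n, gamma, mean_k, rng):
    """Draw n hidden variables with density proportional to kappa**(-gamma)."""
    kappa_0 = kappa_min(gamma, mean_k)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    return [kappa_0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    kappas = sample_kappa(100000, 2.5, 6.0, rng)
    print(min(kappas), sum(kappas) / len(kappas))
```

For exponents between 2 and 3 the variance of this distribution is infinite, so the sample mean approaches the target average degree only slowly as the sample grows.
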

FIG. 2: a-d: Degree-dependent clustering coefficient as a
function of the rescaled internal degree for the Internet BGP
map, the PGP web of trust, and their randomized versions.
e: Average clustering coefficient as a function of the threshold
degree $k_T$ for renormalized real networks and their randomized
counterparts. f: Internal average degree as a function of
$k_T$ for the same networks.

Given the freedom in the choice of the space dimension
and particular form of $r$, we hereafter consider the
simplest one-dimensional model. We place nodes on a
circle, $S^1$, by assigning them a random variable $\theta$ representing
their polar angle, uniformly distributed in the
interval $[0, 2\pi)$. The circle radius $R$ grows linearly with
the total number of nodes, $2\pi R = N/\delta$, in order to
keep the average density of nodes on the circle fixed to a
constant value $\delta$ that, without loss of generality, we set
to $\delta = 1$. We specify the connection probability $r$ such
that we can control clustering in the generated graphs.
Specifically, we define $r$, compliant with Eq. (1), as

    $r(\theta, \kappa; \theta', \kappa') = \left(1 + \frac{d(\theta, \theta')}{\mu \kappa \kappa'}\right)^{-\alpha}$, $\alpha > 1$,    (3)

where $d(\theta, \theta')$ is the geodesic distance over the circle, i.e.,
the metric distance $d$ between nodes discussed above,
and $\mu = (\alpha - 1)/(2\langle k \rangle)$ is given by the normalization condition
$\langle k \rangle = \frac{N}{(2\pi)^2} \int\!\!\int\!\!\int\!\!\int \rho(\kappa)\, r(\theta, \kappa; \theta', \kappa')\, \rho(\kappa')\, d\theta\, d\kappa\, d\theta'\, d\kappa'$
for large $N$. Parameter $\alpha$ controls clustering. The larger
$\alpha$, the more preferred short-distance connections; thus,
the more triangles are formed. The exact dependence of
the average clustering $\bar{c}$ on $\alpha$ is not important, but both
our analytic and simulation results confirm that, as expected,
$\bar{c} \to 0$ when $\alpha \to 1$, and that $\bar{c}$ converges to a
constant value, dependent on $\gamma$, when $\alpha \to \infty$. We skip
the details for brevity; they will be published elsewhere.

FIG. 3: a-b: Scaling of the degree-dependent clustering coefficient
in a modeled network using the connection probability
given by Eq. (3) ($\gamma = 2.5$, $\alpha = 5.0$, $\langle k \rangle = 6$, $N = 10^{?}$) and its
randomization for different values of $k_T$. c: Average clustering
coefficient and d: average internal degree for the same
networks, cf. Fig. 2.
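
A minimal generator for the circle model can be sketched as follows. It assumes the connection probability $(1 + d/(\mu\kappa\kappa'))^{-\alpha}$ with $\mu = (\alpha - 1)/(2\delta\langle k \rangle)$ and $\delta = 1$, samples the hidden variables by inverse transform, and tests every pair of nodes, so it costs $O(N^2)$ operations. The function name `generate_s1_network` and all implementation choices are assumptions of this sketch rather than the procedure used to produce the results reported here.

```python
import math
import random


def generate_s1_network(n, gamma, alpha, mean_k, seed=0):
    """Sample (theta, kappa, edges) from the circle model; O(n^2) pair loop."""
    rng = random.Random(seed)
    delta = 1.0                                  # node density on the circle
    radius = n / (2.0 * math.pi * delta)         # from 2*pi*R = N/delta
    mu = (alpha - 1.0) / (2.0 * delta * mean_k)  # assumed normalization constant
    kappa_0 = (gamma - 2.0) * mean_k / (gamma - 1.0)
    theta = [2.0 * math.pi * rng.random() for _ in range(n)]
    kappa = [kappa_0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
             for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dtheta = abs(theta[i] - theta[j])
            # Geodesic distance: the shorter of the two arcs between the nodes.
            dist = radius * min(dtheta, 2.0 * math.pi - dtheta)
            prob = (1.0 + dist / (mu * kappa[i] * kappa[j])) ** (-alpha)
            if rng.random() < prob:
                edges.append((i, j))
    return theta, kappa, edges


if __name__ == "__main__":
    theta, kappa, edges = generate_s1_network(1000, 2.5, 5.0, 6.0, seed=3)
    print("average degree:", 2.0 * len(edges) / 1000)
```

In finite samples the measured average degree typically falls somewhat below the target, because the largest hidden variables that carry much of the mean are rarely drawn.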

We next check whether our synthetic graphs are small
worlds, as spatially embedded networks do not always
have this property [?]. To this end, we compute the
probability $p(d, \kappa|\kappa')$ that a node with hidden variable $\kappa'$
has a neighbor with hidden variable $\kappa$ at geodesic distance
$d$ on $S^1$. We use the hidden variables formalism [7]
and the result reads

    $p(d, \kappa|\kappa') = \frac{2}{\kappa'}\, \rho(\kappa) \left(1 + \frac{d}{\mu \kappa \kappa'}\right)^{-\alpha}$.    (4)

Integration over $\kappa$ gives the probability $p(d|\kappa')$ that a node has
a neighbor at distance $d$. For large $d$ this function scales
as $p(d|\kappa') \sim d^{-\alpha}$ when $\alpha < \gamma - 1$ and $p(d|\kappa') \sim d^{-(\gamma-1)}$
when $\alpha > \gamma - 1$. The network is a small world when
the average hidden distance to nearest neighbors $\bar{d}(\kappa') =
\int x\, p(x|\kappa')\, dx$ diverges in the large-$N$ and, consequently,
large-$R$ limit. Such divergence indicates the presence of
links connecting nodes located at all, including arbitrarily
large, hidden distance scales. This average distance $\bar{d}(\kappa')$
diverges when the exponent of the asymptotic form of
$p(d|\kappa')$ for large $d$ is smaller than 2, i.e., when either
$1 < \alpha < 2$ or $2 < \gamma < 3$, or both. Real scale-free networks
have values of $\gamma$ between 2 and 3, meaning that they
correspond to the class of small-world networks in our
model, regardless of the value of $\alpha$.
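
The two scaling regimes quoted above can be recovered with a change of variables. The sketch below assumes the power-law density $\rho(\kappa) = (\gamma-1)\kappa_0^{\gamma-1}\kappa^{-\gamma}$ and a joint probability proportional to $\rho(\kappa)\,(1 + d/(\mu\kappa\kappa'))^{-\alpha}/\kappa'$; the symbols $x$, $x_0$, and $\beta$ are introduced here for convenience.

```latex
\begin{align}
p(d|\kappa')
  &\propto \frac{1}{\kappa'} \int_{\kappa_0}^{\infty} \mathrm{d}\kappa\,
     \kappa^{-\gamma} \left(1 + \frac{d}{\mu\kappa\kappa'}\right)^{-\alpha}
   = \frac{1}{\kappa'} \left(\frac{d}{\mu\kappa'}\right)^{1-\gamma}
     \int_{x_0}^{\infty} \mathrm{d}x\, x^{-\gamma}
     \left(1 + \frac{1}{x}\right)^{-\alpha},
  \qquad x_0 = \frac{\mu\kappa_0\kappa'}{d} . \\
\intertext{For $x \to 0$ the integrand behaves as $x^{\alpha-\gamma}$, so as
$d \to \infty$ (that is, $x_0 \to 0$):}
\alpha > \gamma - 1 &: \quad \text{the integral converges, hence }
     p(d|\kappa') \sim d^{-(\gamma-1)} ; \\
\alpha < \gamma - 1 &: \quad \text{the integral grows as }
     x_0^{\alpha-\gamma+1} \propto d^{\,\gamma-1-\alpha}, \text{ hence }
     p(d|\kappa') \sim d^{-\alpha} . \\
\intertext{With $\beta = \min(\alpha, \gamma-1)$ the mean neighbor distance on a
circle of radius $R$ then scales as}
\bar{d}(\kappa') &\sim \int^{\pi R} x \cdot x^{-\beta}\, \mathrm{d}x
     \sim R^{\,2-\beta} \quad (\beta < 2),
\end{align}
```

which grows without bound as $R$ increases whenever $\alpha < 2$ or $\gamma < 3$.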

Finally, we analyze the self-similarity of networks produced
by our model. Thanks to the proportionality between
$\bar{k}(\kappa)$ and $\kappa$, we can work in the $\kappa$-space instead of
the $k$-space. Since $\rho(\kappa)$ is a power law, the distribution
of $\kappa$ for nodes in the subgraph $G(\kappa_T)$ (with $\kappa > \kappa_T$),
$\rho(\kappa|\kappa_T)$, is given by the same power-law function but
starting at $\kappa_T$ instead of $\kappa_0$. Therefore, the average of $\kappa$
within $G(\kappa_T)$ is given by $\langle \kappa(\kappa_T) \rangle = \langle k \rangle \kappa_T/\kappa_0$. The number
of nodes in $G(\kappa_T)$ is $N(\kappa_0/\kappa_T)^{\gamma-1}$, so that their density in
$S^1$ is $\delta(\kappa_T) = \delta(\kappa_0/\kappa_T)^{\gamma-1}$, where $\delta$ is the density of the original
graph $G$. Since the connection probability does not depend
on $\kappa_T$, subgraphs $G(\kappa_T)$ are replicas of $G$ after the
following renormalization of the parameters:

    $\kappa_0 \to \kappa_T$;  $\delta \to \delta(\kappa_0/\kappa_T)^{\gamma-1}$.    (5)

In particular, the average degree of nodes with hidden
variable $\kappa$ in $G(\kappa_T)$, $\bar{k}_i(\kappa|\kappa_T)$, and the average degree of
all nodes in $G(\kappa_T)$, $\langle k_i(\kappa_T) \rangle$, are

    $\bar{k}_i(\kappa|\kappa_T) = (\kappa_0/\kappa_T)^{\gamma-2}\, \kappa$  and  $\langle k_i(\kappa_T) \rangle = (\kappa_T/\kappa_0)^{3-\gamma} \langle k \rangle$.    (6)

The degree distribution in $G(\kappa_T)$ is given by the same
analytic expression Eq. (2), except that $\langle k_i(\kappa_T) \rangle$ from
Eq. (6) replaces $\langle k \rangle$ in Eq. (2).
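
The internal degrees of the thresholded subgraph can be checked step by step. The sketch below assumes the power-law density $\rho(\kappa) = (\gamma-1)\kappa_0^{\gamma-1}\kappa^{-\gamma}$ for $\kappa > \kappa_0$, and uses the fact that the expected degree of a node is proportional to the node density, to the mean hidden variable, and to the node's own $\kappa$, with the constant fixed so that $\bar{k}(\kappa) = \kappa$ in the full graph.

```latex
\begin{align}
\frac{N(\kappa_T)}{N}
  &= \int_{\kappa_T}^{\infty} \rho(\kappa)\, \mathrm{d}\kappa
   = \left(\frac{\kappa_0}{\kappa_T}\right)^{\gamma-1}
   = \frac{\delta(\kappa_T)}{\delta} , \\
\langle \kappa(\kappa_T) \rangle
  &= \int_{\kappa_T}^{\infty} \kappa\, (\gamma-1)\,\kappa_T^{\gamma-1}
     \kappa^{-\gamma}\, \mathrm{d}\kappa
   = \frac{\gamma-1}{\gamma-2}\,\kappa_T
   = \langle k \rangle\, \frac{\kappa_T}{\kappa_0} , \\
\bar{k}_i(\kappa|\kappa_T)
  &= \kappa\, \frac{\delta(\kappa_T)}{\delta}\,
     \frac{\langle \kappa(\kappa_T) \rangle}{\langle k \rangle}
   = \left(\frac{\kappa_0}{\kappa_T}\right)^{\gamma-2} \kappa , \\
\langle k_i(\kappa_T) \rangle
  &= \left(\frac{\kappa_0}{\kappa_T}\right)^{\gamma-2}
     \langle \kappa(\kappa_T) \rangle
   = \left(\frac{\kappa_T}{\kappa_0}\right)^{3-\gamma} \langle k \rangle .
\end{align}
```

For $2 < \gamma < 3$ the last expression increases with $\kappa_T$, consistent with an average internal degree that grows with the threshold.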

The exact expression for clustering in $G(\kappa_T)$ is rather
long and we omit it here for brevity, but it can be easily
derived from results in [7]. What matters for our analysis
is that the clustering coefficient of nodes with hidden
variable $\kappa$ in $G(\kappa_T)$ satisfies $\bar{c}(\kappa|\kappa_T) = f(\kappa/\kappa_T)$, where
$f$ is some function. It follows that the average clustering
coefficient in $G(\kappa_T)$, $\bar{c}(\kappa_T) = \int_{\kappa_T}^{\infty} d\kappa\, \rho(\kappa|\kappa_T)\, \bar{c}(\kappa|\kappa_T)$,
takes a finite value independent of $\kappa_T$. Using once again
the proportionality between the $\kappa$-space and $k$-space and
the scaling relations in Eq. (6), we conclude that

    $\bar{c}(k_i|k_T) \approx f(k_i/k_T^{3-\gamma}) = f(k_i/\langle k_i(k_T) \rangle)$,    (7)

where constant factors are absorbed into the argument of $f$,
and we use the symbol $\approx$ to account for fluctuations
of degrees of nodes with small values of $\kappa$.
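
Both statements follow from rescaling the hidden variable by the threshold. The sketch below assumes the power-law density restricted to $\kappa > \kappa_T$, namely $\rho(\kappa|\kappa_T) = (\gamma-1)\kappa_T^{\gamma-1}\kappa^{-\gamma}$, and internal degrees of the form $\bar{k}_i(\kappa|\kappa_T) = (\kappa_0/\kappa_T)^{\gamma-2}\kappa$ with $\langle k_i(\kappa_T) \rangle = (\kappa_T/\kappa_0)^{3-\gamma}\langle k \rangle$.

```latex
\begin{align}
\bar{c}(\kappa_T)
  &= \int_{\kappa_T}^{\infty} \mathrm{d}\kappa\,
     (\gamma-1)\,\kappa_T^{\gamma-1}\kappa^{-\gamma}\,
     f\!\left(\frac{\kappa}{\kappa_T}\right)
   = (\gamma-1) \int_{1}^{\infty} \mathrm{d}x\, x^{-\gamma} f(x)
   \qquad \bigl(x = \kappa/\kappa_T\bigr) , \\
\frac{\kappa}{\kappa_T}
  &= \frac{k_i}{\kappa_T^{3-\gamma}\,\kappa_0^{\gamma-2}}
   = \frac{\langle k \rangle}{\kappa_0}\,
     \frac{k_i}{\langle k_i(\kappa_T) \rangle}
   = \frac{\gamma-1}{\gamma-2}\,
     \frac{k_i}{\langle k_i(\kappa_T) \rangle}
   \qquad \bigl(k_i \approx \bar{k}_i(\kappa|\kappa_T)\bigr) .
\end{align}
```

The first line contains no $\kappa_T$ and is finite because $f \le 1$ and $\gamma > 1$; the second shows that $\kappa/\kappa_T$ and the rescaled internal degree differ only by a constant factor that depends on $\gamma$.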

In Fig. 3, we show that our simulation results match
perfectly the scaling of clustering predicted by Eq. (7).
The same figure demonstrates that the clustering-related
self-similarity properties of our modeled networks and
their randomizations are qualitatively the same as of real
networks in Fig. 2. We emphasize that the self-similarity
of clustering observed in our model does not depend either
on the dimension of the hidden space or on the particular
form of $r$. The only requirements are that nodes are
located in a metric space and connected under the integrable
connection probability $r(d/d_c)$ with $d_c$ given by
Eq. (1), and that the degree distribution is scale-free.

In summary, hidden geometries underlying the observed
topologies of some complex networks appear
to provide a simple and natural explanation of their
degree-renormalization self-similarity. If we take the
most generic interpretation of hidden distances as measures
of either structural or functional similarity between
nodes [17, ?], and admit that more similar nodes are more
likely to be connected, then the hidden and observable
forms of transitivity become clearly related. At the hidden
geometry layer, this transitivity is the transitivity
of "being close," while at the observed topology layer, it
is the transitivity of "being connected." In future work,
hidden metric spaces may find far-reaching applications
such as the design of efficient routing and searching algorithms
for communication and social networks. Also
worth pursuing is studying the relationship between fractality
in [4] and self-similarity under our renormalization
procedure.